This paper proposes a software pipelining framework, CALiBeR (Cluster-Aware Load Balancing Retiming Algorithm), suitable for compilers targeting clustered embedded VLIW processors. CALiBeR can be used by embedded system designers to explore different code optimization alternatives, that is, high-quality customized retiming solutions for desired throughput and program memory size requirements, while minimizing register pressure. An extensive set of experimental results is presented, demonstrating that our algorithm compares favorably with one of the best state-of-the-art algorithms, achieving up to 50% improvement in performance and up to 47% improvement in register requirements. In order to empirically assess the effectiveness of clustering for high ILP applications, additional experiments are presented contrasting the performance achieved by software pipelined kernels executing on clustered and on centralized machines.
The Earth Simulator (ES), developed under the Japanese government's "Earth Simulator project" initiative, is a highly parallel vector supercomputer system. This paper gives an overview of the ES and describes its architectural features, its hardware technology, and the results of its performance evaluation.
In May 2002, the ES was acknowledged to be the most powerful computer in the world, achieving 35.86 teraflop/s on the LINPACK HPC benchmark and 26.58 teraflop/s on an atmospheric general circulation code (AFES). This remarkable performance may be attributed to three architectural features: vector processors, shared memory, and a high-bandwidth non-blocking crossbar interconnection network.
The ES consists of 640 processor nodes (PN) and an interconnection network (IN), which are housed in 320 PN cabinets and 65 IN cabinets. The ES is installed in a specially designed building, 65 m long, 50 m wide and 17 m high. To realize this advanced system, many kinds of hardware technology were developed, including high-density, high-frequency LSI, high-frequency signal transmission, high-density packaging, and a high-efficiency, low-noise cooling and power supply system, so as to reduce the overall volume and total power consumption of the ES.
For highly parallel processing, a special synchronization mechanism connecting all nodes, the Global Barrier Counter (GBC), has been introduced.